Pseudorehearsal in value function approximation
Abstract
Catastrophic forgetting is of special importance in reinforcement learning, because the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole-balancing task. We found that pseudorehearsal appears to assist learning even in a problem as simple as this one, provided the rehearsal parameters are properly initialized.
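The abstract does not spell out the mechanics, so the following is only a minimal sketch of generic pseudorehearsal (the mechanism proposed by Robins) combined with Q-learning. It rests on several assumptions that do not come from the paper: a linear Q-function over the raw pole-balancing state, pseudo-inputs drawn uniformly from [-1, 1]^4, and rehearsal of a few pseudo-items after every real update. All names (`LinearQ`, `make_pseudo_items`, `train_step`) and parameters (`n_items`, `n_rehearse`, the sampling range) are hypothetical, and the paper's function approximator and pseudorehearsal variants may differ.

```python
import random

# Assumed pole-balancing setup (hypothetical, not from the paper): a 4-dimensional
# state (cart position, cart velocity, pole angle, pole angular velocity) and two
# discrete actions (push left, push right).
STATE_DIM = 4
N_ACTIONS = 2


def features(state):
    """Feature vector phi(s): the raw state plus a constant bias term."""
    return list(state) + [1.0]


class LinearQ:
    """Q(s, a) = w_a . phi(s), one weight vector per action."""

    def __init__(self, n_features, n_actions, lr=0.01):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def q_values(self, state):
        phi = features(state)
        return [sum(wi * xi for wi, xi in zip(w_a, phi)) for w_a in self.w]

    def sgd_step(self, state, action, target):
        """One gradient step on 0.5 * (target - Q(s, a))**2; returns the error."""
        phi = features(state)
        error = target - self.q_values(state)[action]
        self.w[action] = [wi + self.lr * error * xi
                          for wi, xi in zip(self.w[action], phi)]
        return error


def make_pseudo_items(model, n_items, low=-1.0, high=1.0, rng=random):
    """Pseudorehearsal: sample random inputs and record the model's current
    Q-values as targets, so each (input, targets) pair approximates old knowledge."""
    items = []
    for _ in range(n_items):
        s = [rng.uniform(low, high) for _ in range(STATE_DIM)]
        items.append((s, model.q_values(s)))
    return items


def train_step(model, transition, pseudo_items, n_rehearse, gamma=0.99, rng=random):
    """Q-learning update on one real transition, followed by rehearsal of
    n_rehearse randomly chosen pseudo-items (all actions). Returns the TD error."""
    s, a, r, s_next, done = transition
    target = r if done else r + gamma * max(model.q_values(s_next))
    td_error = model.sgd_step(s, a, target)
    k = min(n_rehearse, len(pseudo_items))
    for ps, pq in rng.sample(pseudo_items, k):
        for action in range(N_ACTIONS):
            model.sgd_step(ps, action, pq[action])
    return td_error
```

In use, one would build the pseudo-items from a snapshot of the model (for example, before a new phase of training) and then call `train_step` for each new transition, so that updates driven by new data are balanced against pulls toward the model's earlier outputs. How many pseudo-items to generate, how many to rehearse per step, and where to sample them presumably correspond to the kind of rehearsal parameters whose initialization the abstract says matters.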
Related papers
Pseudorehearsal in actor-critic agents with neural network function approximation
Catastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole-balancing task and compared different pseudorehearsal approaches. We have found that pseudorehearsal can assist learning and decre...
Pseudorehearsal in actor-critic agents
Catastrophic forgetting has a serious impact in reinforcement learning, as the data distribution is generally sparse and non-stationary over time. The purpose of this study is to investigate whether pseudorehearsal can increase the performance of an actor-critic agent with neural-network based policy selection and function approximation in a pole-balancing task and compare different pseudorehearsa...
Catastrophic Forgetting, Rehearsal and Pseudorehearsal
This paper reviews the problem of catastrophic forgetting (the loss or disruption of previously learned information when new information is learned) in neural networks, and explores rehearsal mechanisms (the retraining of some of the previously learned information as the new information is added) as a potential solution. We replicate some of the experiments described by Ratcliff (1990), includi...
Catastrophic Forgetting and the Pseudorehearsal Solution in Hopfield-type Networks
Pseudorehearsal is a mechanism proposed by Robins which alleviates catastrophic forgetting in multi-layer perceptron networks. In this paper, we extend the exploration of pseudorehearsal to a Hopfield-type net. The same general principles apply: old information can be rehearsed if it is available, and if it is not available, then generating and rehearsing approximations of old information that ...
Minimizing a General Penalty Function on a Single Machine via Developing Approximation Algorithms and FPTASs
This paper addresses the Tardy/Lost penalty minimization on a single machine. According to this penalty criterion, if the tardiness of a job exceeds a predefined value, the job will be lost and penalized by a fixed value. Besides its application in real world problems, Tardy/Lost measure is a general form for popular objective functions like weighted tardiness, late work and tardiness with reje...
Journal:
- CoRR
Volume: abs/1703.07075
Publication date: 2017